ABSTRACT

This  document  provides  an  introduction  and  overview 
for  the  document  series  TM-4652,  Analysis  and 
Development  of  the  Vicens-Reddy  Speech  Recognition
System. 
1.  INTRODUCTION

For  about  a  year,  SDC  has  been  involved  in  a  program  of  development  of  voice
communication  with  the  computer.  This  has  included  studies  of  both  automatic 
speech  recognition  and  speech  synthesis.  Initial  investigations  have  indicated 
that  one  of  the  most  successful  speech  recognition  systems  was  developed  by 
Pierre  Vicens  (under  the  direction  of  Raj  Reddy)  on  the  PDP-10  computer  at 
Stanford  University.  It  was  decided  to  implement  this  system  on  the  IBM  360/67
at  SDC  to  provide  a  base  for  further  development. 

This  particular  system  is  unique  in  the  sense  that  it  approaches  the  problem  of 
speech  recognition  as  a  whole,  rather  than  treating  particular  aspects  of  the 
problem  as  in  previous  attempts.  For  example,  where  earlier  systems  treated 
only  segmentation  of  speech  into  phoneme  groups,  or  detected  phonemes  in  a  given
context,  the  Vicens-Reddy  system  processes  the  incoming  speech  signal,  applies 
heuristics  to  segment  the  signal  and  to  identify  phoneme-like  units  and  then 
uses  the  total  phonemic  pattern  to  recognize  an  entry  in  the  lexicon. 

The  Vicens-Reddy  system  can  be  divided  into  six  parts:  (1)  hardware  preprocessing, 
(2)  software  preprocessing,  (3)  segmentation,  (4)  recognition,  (5)  lexicon
development,  and  (6)  lexicon  usage.  Preprocessing  involves  digitizing  and  normalizing  the
raw  voice  input.  The  output  of  the  preprocessing  is  an  array,  called  the  Q-matrix,
whose  rows  are  the  amplitudes  and  zero-crossing  counts  of  the  raw  voice  input 
over  10  ms.  time  periods.*  Segmentation  applies  heuristics  to  combine  the  minimal 
segments  of  the  above  Q-matrix  into  larger  transitional  or  sustained  segments  of
a  new  array,  called  the  P-matrix.  Recognition  applies  heuristics  to  the  P-matrix 
in  classifying  sustained  segments  into  phoneme-like  groups  (fricative,  vowel,  stop, 
nasal,  consonant  or  burst)  and  produces  the  R-matrix  or  feature  matrix.  The
lexicon  development  involves  the  addition  of  the  feature  matrix  from  the  previous 
recognition  process  into  the  lexicon.  In  lexicon  usage,  the  feature  matrix  from 
the  recognition  process  is  used  to  achieve  the  best  match  in  the  lexicon. 
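
As a rough illustration of the preprocessing output described above, the following Python sketch builds Q-matrix-like rows from a list of digitized samples, one row per 10 ms. minimal segment. It is not the Vicens-Reddy code: the function name `q_matrix`, the use of peak absolute value as the "amplitude", the single unfiltered channel, and the sign-change definition of a zero crossing are all assumptions made for illustration.

```python
def q_matrix(samples, sample_rate_hz, frame_ms=10):
    """Return one (amplitude, zero_crossings) row per minimal segment.

    Assumptions (not taken from the original program):
      - amplitude is the peak absolute sample value in the segment
      - a zero crossing is a sign change between adjacent samples,
        with zero treated as non-negative
      - a trailing partial segment is discarded
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("sample rate too low for the requested segment length")

    rows = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        amplitude = max(abs(s) for s in frame)
        zero_crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        rows.append((amplitude, zero_crossings))
    return rows


if __name__ == "__main__":
    # 20 samples at a (hypothetical) 1000 Hz rate give two 10 ms. segments.
    demo = [1, -1] * 10
    for row in q_matrix(demo, sample_rate_hz=1000):
        print(row)
```

Segmentation would then operate on these rows, merging adjacent minimal segments with similar parameters into the sustained or transitional segments of the P-matrix; the merging heuristics themselves are not reproduced here.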


*  A  10  ms.  time  period  will  be  referred  to  as  a  minimal  segment  throughout.

Some  amount  of  speech  research  at  SDC  will  be  based  on  this  program.  Such
research  requires  detailed  explanations  of  the  various  heuristics  used  in  the
program.  However,  the  available  documentation  is  not  complete.  This  series  of 
documents  will  begin  with  a  detailed  description  of  the  Vicens-Reddy  system,
along  with  explanations  for  a  large  number  of  previously  unexplained  heuristics 
used  in  the  program.  Later  documents  will  describe  various  modifications  to 
the  system. 